Methods for the Classification of Data from Open-Ended Questions in Surveys

Disputation
16 April 2024

Camille Landesvatter

University of Mannheim

Research Objective

“[…] introducing various methods of classifying data from open-ended survey questions and empirically illustrating their application.
A central research question addressed in this thesis therefore concerns the analysis of (short) text data generated by open-ended survey questions.” (Landesvatter 2023, p.2)

Terminology: Open-Ended Questions in Surveys

  • “survey questions that do not include a set of response options” (Züll, 2016, p. 1)

  • “require respondents to formulate a response in their own words and to express it verbally or in writing” (Züll, 2016, p. 1)

  • ≠ closed-ended questions with answer categories presented in a closed form (Inui et al., 2001, p. 1)

Strategy

“[…] introducing various methods of classifying data from open-ended survey questions and empirically illustrating their application. A central research question addressed in this thesis therefore concerns the analysis of (short) text data generated by open-ended survey questions.”

  1. introducing readers to the survey methodology of using open-ended questions
    • including historical and modern developments, characteristics and challenges of open-ended questions, types of OEQs (e.g., probing)
  2. introducing readers to computational methods available for analysis of open-ended answers
    • manual, semi-automated, fully automated
  3. applying several of these methods in three empirical studies

Methods for Analyzing Data from Open-Ended Questions

Table 1. Overview of methods for classifying open-ended survey responses

Motivation

➡️ The increase in methods to collect open-ended answers (e.g. smartphone-administered surveys, voice technologies, novel methods) calls for testing and validating automated methods to analyze the resulting data

❓ Why did I choose the Survey Context?

❓ Why do I focus on computational methods?

Motivation: Why Survey Context?

  • data from OEQs represent a special and intriguing type of data for ML applications because the resulting texts are short, concise, and low in context
  • this can require suitable methods, e.g., word embeddings or structural topic models

Figure 1: The previous question was: ‘How often can you trust the federal government in Washington to do what is right?’. Your answer was: ‘[Always; Most of the time; About half of the time; Some of the time; Never; Don’t Know]’. In your own words, please explain why you selected this answer.

Motivation: Why Computational Methods?

  • fully manual methods are resource-intensive (time and effort)

  • but more importantly, human codings can

    • be biased (Mosca et al., 2022),
    • lack objectivity (Inui et al., 2001),
    • introduce errors when coders misinterpret answers or annotation codes (Giorgetti & Sebastiani, 2003),
    • face transparency issues related to unitization and intercoder reliability (Campbell et al., 2013).
  • automated methods offer objectivity and systematicness (Zhang et al., 2022)

  • still, issues persist (e.g., transparency) which makes it crucial to test and evaluate methods for the social sciences

Empirical Contributions

Empirical Contributions: Overview

  1. How valid are trust survey measures? New insights from open-ended probing data and supervised machine learning
  2. Open-ended survey questions: A comparison of information content in text and audio response formats
  3. Asking Why: Is there an Affective Component of Political Trust Ratings in Surveys?
  • data collection approach: three self-administered web surveys with open-ended questions
  • data from three U.S. non-probability samples
  • methodology for text classification: supervised ML, unsupervised ML, fine-tuning of pre-trained language model BERT, zero-shot learning

How valid are trust survey measures? New insights from open-ended probing data and supervised machine learning

Co-authored by: Dr. Paul C. Bauer

Published In: Landesvatter, C., & Bauer, P. C. (2024). How Valid Are Trust Survey Measures? New Insights From Open-Ended Probing Data and Supervised Machine Learning. Sociological Methods & Research, 0(0). https://doi.org/10.1177/00491241241234871

The validity of trust survey measures: Background

  • Background:
    • ongoing debates about which type of trust survey researchers are measuring with traditional survey items (i.e., equivalence debate cf. Bauer & Freitag 2018)
  • Research Question:
    • How valid are traditional trust survey measures?
  • Experimental Design:
    • block randomized question order where closed-ended questions are followed by open-ended follow-up probing questions

The validity of trust survey measures: Methodology

  • Operationalization via two classifications: share of known vs. unknown others in associations (I), sentiment (pos-neu-neg) of associations (II)
  • Supervised classification approach:
      1. manual labeling of randomly sampled documents (n = 1,000 (I) / 1,500 (II))
      2. fine-tuning the weights of two BERT models (base model, uncased version), using the manually coded data as training data, to classify the remaining documents (n = 6,500 (I) / 6,000 (II))
    • accuracy: 87% (I) and 95% (II)

The validity of trust survey measures: Results

Paper 1 Results
Figure 1: Illustration of exemplary data.
Figure 2: Associations and trust scores across different measures.

Open-ended survey questions: A comparison of information content in text and audio response formats

Co-authored by: Dr. Paul C. Bauer

Submitted to: Public Opinion Quarterly in February 2024

Information content Text vs. Audio Responses: Background

  • Background:
    • recent increase in voice-based response options in surveys, driven by mobile devices with voice input, smartphone surveys, and speech-to-text technologies
  • Research Question:
    • Are there differences in information content between responses given in voice and text formats?
  • Experimental Design:
    • block randomized question order with open-ended and probing questions
    • random assignment into either the text or voice condition

Information content Text vs. Audio Responses: Methodology

  • Operationalization via application of measures from information theory and machine learning to classify open-ended survey answers
    • number of topics, response entropy
    • plus, response length
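The response-entropy idea can be illustrated with Shannon entropy over an answer's word distribution. This is a minimal stdlib sketch of the general concept, not the paper's exact operationalization (the function name and tokenization are illustrative assumptions):

```python
import math
from collections import Counter

def response_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the word distribution of one answer.

    A rough proxy for information content: answers that repeat the same
    few words score lower than answers with a varied vocabulary.
    """
    words = text.lower().split()
    if not words:
        return 0.0
    counts = Counter(words)
    n = len(words)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(response_entropy("good good good"))  # 0.0 (a single unique word)
print(response_entropy("the government mostly acts in the public interest"))
```

Response length (e.g., word count) falls out of the same tokenization; the number-of-topics measure would additionally require a topic model.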

Information content Text vs. Audio Responses: Results

Figure 3: Information Content Measures across questions.

Asking Why: Is there an Affective Component of Political Trust Ratings in Surveys?

Co-authored by: Dr. Paul C. Bauer

Submitted to: American Political Science Review in March 2024

Affective Components in Political Trust: Background

  • Background:
    • the conventional notion that trust originates from informed, rational, and consequential judgments is challenged by the idea of an “affective-based” form of (political) trust
  • Research Question:
    • Are individual trust judgments in surveys driven by affective rationales?
  • Questionnaire Design:
    • closed-ended political trust question followed by open-ended probing question

Affective Components in Political Trust: Methodology

  • Operationalization via sentiment and emotion analysis

  • Transcript-based

    • pysentimiento for sentiment recognition (Pérez et al. 2023)
    • zero-shot prompting with GPT-3.5
  • Speech-based

    • SpeechBrain for Speech Emotion Recognition (Ravanelli et al. 2021)
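The zero-shot prompting step can be sketched as prompt construction plus mapping the model's free-text reply back onto a fixed label set. The prompt wording and helper names below are illustrative assumptions, not the study's actual prompt, and the GPT-3.5 API call itself is omitted:

```python
# Minimal sketch of zero-shot sentiment classification via prompting.
LABELS = ("positive", "neutral", "negative")

def build_prompt(answer: str) -> str:
    """Compose a zero-shot instruction: no labeled examples are provided."""
    return (
        "Classify the sentiment of the following open-ended survey answer "
        f"as one of {', '.join(LABELS)}. Reply with the label only.\n\n"
        f"Answer: {answer}"
    )

def parse_label(model_reply: str) -> str:
    """Map a free-text model reply onto one of the allowed labels."""
    reply = model_reply.strip().lower()
    for label in LABELS:
        if label in reply:
            return label
    return "neutral"  # fallback when the reply is unparseable

prompt = build_prompt("They usually do what is right.")
# reply = <GPT-3.5 chat-completion call goes here>  # omitted in this sketch
print(parse_label("Negative."))  # negative
```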

Affective Components in Political Trust: Results

Figure 6: Results from Speech Emotion Recognition.

Summary & Discussion

  • web surveys can be used to collect narrative answers that provide valuable insights into survey responses

  • various modern developments (smartphone surveys, speech-to-text algorithms) can be leveraged to collect such data in innovative ways (e.g., spoken answers)

    • always consider challenges and objectives (i.e., in terms of sample sizes and sample compositions)
  • computational measures can be applied to classify open-ended answers from surveys in order to inform ongoing debates in different fields, e.g.:

    • equivalence debate in trust research (Study 1), cognitive-versus-affective debate in political trust research (Study 3)
    • survey questionnaire design (Study 2) or item and data quality in general (e.g., associations, sentiment) (Study 1-3)

Some observations and conclusions

Facilitated accessibility and implementation of semi-automated methods.
  • supervised models have long been the standard in automated methods, but recent large, general-purpose pre-trained models (e.g., BERT) allow less resource-intensive fine-tuning

  • For example, using only ~13% of the documents (1,000 of 7,500 in Study 1) for fine-tuning already yielded sufficient accuracy (87%)

    • increasing the number of manually labeled documents can further improve accuracy (92% in Study 1)
    • a higher number of manual examples also improves the transparency of results: an accuracy vs. transparency trade-off
    • start with simple methods and evaluate (e.g., in Study 1: first a Random Forest, only later BERT)
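The "start simple" recommendation could look like the following scikit-learn sketch: TF-IDF features plus a Random Forest trained on a small manually labeled sample, evaluated before investing in BERT fine-tuning. The texts and labels here are toy stand-ins, not study data:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy stand-in for a small, manually labeled training sample.
texts = [
    "I trust them to do the right thing",
    "they mostly act in our interest",
    "politicians only care about themselves",
    "the government lies to us",
]
labels = ["positive", "positive", "negative", "negative"]

# Vectorize and fit in one pipeline, mirroring the label-then-train step.
model = make_pipeline(TfidfVectorizer(), RandomForestClassifier(random_state=0))
model.fit(texts, labels)

# Classify "remaining" unlabeled answers with the simple baseline.
print(model.predict(["they do the right thing"]))
```

If the baseline's held-out accuracy is already sufficient, the heavier fine-tuning step may not be needed at all.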

Some observations and conclusions

Increase in possibilities of fully automated methods (e.g., prompt engineering).
  • fully automated methods, such as zero-shot prompting, can keep up with fine-tuned versions of pre-trained models (e.g., pysentimiento, Study 3)
    • deciding on a suitable number of manual examples, and on a method in general (e.g., fully automated/unsupervised versus semi-automated/supervised fine-tuning), depends on the expected difficulty of the task, the desired accuracy, and the available time and cost

Thank you for your Attention!

References